The following libraries were used to create this report:
library(knitr)
library(dplyr)
library(ggplot2)
library(skimr)
library(plotly)
library(caret)
library(ggeasy)
To make the results reproducible, a seed was set for the pseudorandom number generator.
set.seed(23)
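A small sketch of what the seed provides: resetting the generator to the same seed replays the same pseudorandom sequence. The variable names `first_draw` and `second_draw` are illustrative only and are not part of the report.

```r
# Resetting the seed to the same value restarts the generator in the same state.
set.seed(23)
first_draw <- sample(1:100, 5)

set.seed(23)
second_draw <- sample(1:100, 5)

# Both draws come from the same generator state, so they are identical.
identical(first_draw, second_draw)  # TRUE
```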
The code below loads the data from the CSV files.
inventories <- read.csv("rebrickable/inventories.csv")
inventory_parts <- read.csv("rebrickable/inventory_parts.csv")
parts <- read.csv("rebrickable/parts.csv")
part_categories <- read.csv("rebrickable/part_categories.csv")
part_relationships <- read.csv("rebrickable/part_relationships.csv")
elements <- read.csv("rebrickable/elements.csv")
colors <- read.csv("rebrickable/colors.csv")
inventory_minifigs <- read.csv("rebrickable/inventory_minifigs.csv")
minifigs <- read.csv("rebrickable/minifigs.csv")
inventory_sets <- read.csv("rebrickable/inventory_sets.csv")
sets <- read.csv("rebrickable/sets.csv")
themes <- read.csv("rebrickable/themes.csv")
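The twelve `read.csv` calls above share one pattern, so they could also be written as a loop that returns a named list. The sketch below is one possible variant, not the report's code: `read_tables`, `demo_dir`, and `lego` are hypothetical names, and the demonstration writes two tiny stand-in files to a temporary directory instead of reading the real `rebrickable/` folder.

```r
# Hypothetical helper: read every table named in `table_names` from `data_dir`
# into a named list, failing early if a file is absent.
read_tables <- function(data_dir, table_names) {
  paths <- file.path(data_dir, paste0(table_names, ".csv"))
  absent <- paths[!file.exists(paths)]
  if (length(absent) > 0) {
    stop("Missing files: ", paste(absent, collapse = ", "))
  }
  tables <- lapply(paths, read.csv)
  names(tables) <- table_names
  tables
}

# Demonstration on two stand-in files (not the real Rebrickable data).
demo_dir <- tempfile("rebrickable_demo")
dir.create(demo_dir)
write.csv(data.frame(id = 1:2, name = c("Technic", "Town")),
          file.path(demo_dir, "themes.csv"), row.names = FALSE)
write.csv(data.frame(id = c(0, 1), name = c("Black", "Blue")),
          file.path(demo_dir, "colors.csv"), row.names = FALSE)

lego <- read_tables(demo_dir, c("themes", "colors"))
names(lego)
nrow(lego$themes)
```

Keeping the tables in one list makes it easy to apply the same overview to each of them with `lapply`, at the cost of writing `lego$themes` instead of `themes` in later code.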
The full LEGO dataset consists of 12 tables. The sections below describe each table, report its size, and show basic statistics.
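Each overview in this part of the report gives a record count, a per-column summary, and a preview of the first rows. A minimal base R sketch of how such an overview could be assembled is given here; `describe_table` and `inventories_demo` are hypothetical names, and the `skim_variable` columns in the report's tables suggest that the original output also relied on `skimr`, which is left out to keep the sketch free of dependencies.

```r
# Hypothetical helper: collect the size, per-column summary, and first rows of a table.
describe_table <- function(df, n_preview = 6) {
  list(
    n_rows = nrow(df),
    n_cols = ncol(df),
    summary = summary(df),
    preview = head(df, n_preview)
  )
}

# Stand-in data frame with the same columns as `inventories`.
inventories_demo <- data.frame(
  id = c(1, 3, 4),
  version = c(1, 1, 1),
  set_num = c("7922-1", "3931-1", "6942-1")
)

overview <- describe_table(inventories_demo)
overview$n_rows
overview$preview
```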
The inventories table is the parent table that links parts and minifigures to LEGO sets.
The table contains 37265 records.
The attributes of this table are:
| id | version | set_num | |
|---|---|---|---|
| Min. : 1 | Min. : 1.000 | Length:37265 | |
| 1st Qu.: 14424 | 1st Qu.: 1.000 | Class :character | |
| Median : 54379 | Median : 1.000 | Mode :character | |
| Mean : 61104 | Mean : 1.091 | NA | |
| 3rd Qu.: 88842 | 3rd Qu.: 1.000 | NA | |
| Max. :194312 | Max. :16.000 | NA |
| id | version | set_num |
|---|---|---|
| 1 | 1 | 7922-1 |
| 3 | 1 | 3931-1 |
| 4 | 1 | 6942-1 |
| 15 | 1 | 5158-1 |
| 16 | 1 | 903-1 |
| 17 | 1 | 850950-1 |
| Name | inventories |
|---|---|
| Number of rows | 37265 |
| Number of columns | 3 |
| Column type frequency: | |
| character | 1 |
| numeric | 2 |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| set_num | 0 | 1 | 3 | 20 | 0 | 35644 | 0 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| id | 0 | 1 | 61103.60 | 51380.10 | 1 | 14424 | 54379 | 88842 | 194312 | ▇▆▂▂▂ |
| version | 0 | 1 | 1.09 | 0.58 | 1 | 1 | 1 | 1 | 16 | ▇▁▁▁▁ |
The inventory_parts table lists the LEGO parts contained in each inventory.
The table contains 1180987 records.
The attributes of this table are:
| inventory_id | part_num | color_id | quantity | is_spare | img_url | |
|---|---|---|---|---|---|---|
| Min. : 1 | Length:1180987 | Min. : -1.0 | Min. : 1.00 | Length:1180987 | Length:1180987 | |
| 1st Qu.: 9404 | Class :character | 1st Qu.: 4.0 | 1st Qu.: 1.00 | Class :character | Class :character | |
| Median : 22838 | Mode :character | Median : 15.0 | Median : 2.00 | Mode :character | Mode :character | |
| Mean : 50849 | NA | Mean : 131.8 | Mean : 3.37 | NA | NA | |
| 3rd Qu.: 87088 | NA | 3rd Qu.: 71.0 | 3rd Qu.: 4.00 | NA | NA | |
| Max. :194312 | NA | Max. :9999.0 | Max. :3064.00 | NA | NA |
| inventory_id | part_num | color_id | quantity | is_spare | img_url |
|---|---|---|---|---|---|
| 1 | 48379c01 | 72 | 1 | f | https://cdn.rebrickable.com/media/parts/photos/1/48379c01-1-e7daa845-2671-4737-8642-3b1574308155.jpg |
| 1 | 48395 | 7 | 1 | f | https://cdn.rebrickable.com/media/parts/photos/7/48395-7-b9152acf-2fa5-4836-a04d-5b7fd39c2406.jpg |
| 1 | stickerupn0077 | 9999 | 1 | f | |
| 1 | upn0342 | 0 | 1 | f | |
| 1 | upn0350 | 25 | 1 | f | |
| 3 | 2343 | 47 | 1 | f | https://cdn.rebrickable.com/media/parts/elements/3000240.jpg |
| Name | inventory_parts |
|---|---|
| Number of rows | 1180987 |
| Number of columns | 6 |
| Column type frequency: | |
| character | 3 |
| numeric | 3 |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| part_num | 0 | 1 | 1 | 20 | 0 | 51051 | 0 |
| is_spare | 0 | 1 | 1 | 1 | 0 | 2 | 0 |
| img_url | 0 | 1 | 0 | 117 | 8180 | 74266 | 0 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| inventory_id | 0 | 1 | 50849.46 | 55136.94 | 1 | 9404 | 22838 | 87088 | 194312 | ▇▂▁▂▁ |
| color_id | 0 | 1 | 131.78 | 862.38 | -1 | 4 | 15 | 71 | 9999 | ▇▁▁▁▁ |
| quantity | 0 | 1 | 3.37 | 9.95 | 1 | 1 | 2 | 4 | 3064 | ▇▁▁▁▁ |
The parts table contains information about LEGO parts, which may consist of several elements.
The table contains 52615 records.
The attributes of this table are:
| part_num | name | part_cat_id | part_material | |
|---|---|---|---|---|
| Length:52615 | Length:52615 | Min. : 1.00 | Length:52615 | |
| Class :character | Class :character | 1st Qu.:17.00 | Class :character | |
| Mode :character | Mode :character | Median :41.00 | Mode :character | |
| NA | NA | Mean :38.91 | NA | |
| NA | NA | 3rd Qu.:60.00 | NA | |
| NA | NA | Max. :68.00 | NA |
| part_num | name | part_cat_id | part_material |
|---|---|---|---|
| 003381 | Sticker Sheet for Set 663-1 | 58 | Plastic |
| 003383 | Sticker Sheet for Sets 618-1, 628-2 | 58 | Plastic |
| 003402 | Sticker Sheet for Sets 310-3, 311-1, 312-3 | 58 | Plastic |
| 003429 | Sticker Sheet for Set 1550-1 | 58 | Plastic |
| 003432 | Sticker Sheet for Sets 357-1, 355-1, 940-1 | 58 | Plastic |
| 003434 | Sticker Sheet for Set 575-2, 653-1, 460-1 | 58 | Plastic |
| Name | parts |
|---|---|
| Number of rows | 52615 |
| Number of columns | 4 |
| Column type frequency: | |
| character | 3 |
| numeric | 1 |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| part_num | 0 | 1 | 1 | 20 | 0 | 52615 | 0 |
| name | 0 | 1 | 3 | 222 | 0 | 52103 | 0 |
| part_material | 0 | 1 | 4 | 16 | 0 | 7 | 0 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| part_cat_id | 0 | 1 | 38.91 | 22.08 | 1 | 17 | 41 | 60 | 68 | ▃▃▂▁▇ |
The part_categories table contains information about the categories of LEGO parts.
The table contains 66 records.
The attributes of this table are:
| id | name | |
|---|---|---|
| Min. : 1.00 | Length:66 | |
| 1st Qu.:19.25 | Class :character | |
| Median :35.50 | Mode :character | |
| Mean :35.36 | NA | |
| 3rd Qu.:51.75 | NA | |
| Max. :68.00 | NA |
| id | name |
|---|---|
| 1 | Baseplates |
| 3 | Bricks Sloped |
| 4 | Duplo, Quatro and Primo |
| 5 | Bricks Special |
| 6 | Bricks Wedged |
| 7 | Containers |
| Name | part_categories |
|---|---|
| Number of rows | 66 |
| Number of columns | 2 |
| Column type frequency: | |
| character | 1 |
| numeric | 1 |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| name | 0 | 1 | 4 | 44 | 0 | 66 | 0 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| id | 0 | 1 | 35.36 | 19.41 | 1 | 19.25 | 35.5 | 51.75 | 68 | ▇▇▇▇▇ |
The part_relationships table describes the relationships between individual parts.
The table contains 29977 records.
The attributes of this table are:
| rel_type | child_part_num | parent_part_num | |
|---|---|---|---|
| Length:29977 | Length:29977 | Length:29977 | |
| Class :character | Class :character | Class :character | |
| Mode :character | Mode :character | Mode :character |
| rel_type | child_part_num | parent_part_num |
|---|---|---|
| P | 3626cpr3662 | 3626c |
| P | 87079pr9974 | 87079 |
| P | 3960pr9971 | 3960 |
| R | 98653pr0003 | 98086pr0003 |
| R | 98653pr0003 | 98088pat0003 |
| R | 98653pr0003 | 98089pat0003 |
| Name | part_relationships |
|---|---|
| Number of rows | 29977 |
| Number of columns | 3 |
| Column type frequency: | |
| character | 3 |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| rel_type | 0 | 1 | 1 | 1 | 0 | 6 | 0 |
| child_part_num | 0 | 1 | 1 | 20 | 0 | 27139 | 0 |
| parent_part_num | 0 | 1 | 1 | 19 | 0 | 4725 | 0 |
The elements table contains information about individual LEGO bricks.
The table contains 84138 records.
The attributes of this table are:
| element_id | part_num | color_id | design_id | |
|---|---|---|---|---|
| Min. : 9327 | Length:84138 | Min. : -1.0 | Min. : 1001 | |
| 1st Qu.: 4259774 | Class :character | 1st Qu.: 8.0 | 1st Qu.: 18454 | |
| Median : 6057754 | Mode :character | Median : 28.0 | Median : 41748 | |
| Mean : 5222065 | NA | Mean : 539.7 | Mean : 45570 | |
| 3rd Qu.: 6262024 | NA | 3rd Qu.: 135.0 | 3rd Qu.: 75474 | |
| Max. :61532443 | NA | Max. :9999.0 | Max. :107520 | |
| NA | NA | NA | NA’s :23682 |
| element_id | part_num | color_id | design_id |
|---|---|---|---|
| 6443403 | 2277c01pr0009 | 1 | 2277 |
| 6300211 | 67906c01 | 14 | 67908 |
| 4566309 | 2564 | 0 | 2564 |
| 4275423 | 53657 | 1004 | 53657 |
| 6194308 | 92926 | 71 | 28967 |
| 6229123 | 26561 | 4 | 26561 |
| Name | elements |
|---|---|
| Number of rows | 84138 |
| Number of columns | 4 |
| Column type frequency: | |
| character | 1 |
| numeric | 3 |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| part_num | 0 | 1 | 2 | 19 | 0 | 33765 | 0 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| element_id | 0 | 1.00 | 5222065.12 | 1596842.63 | 9327 | 4259773.50 | 6057754 | 6262024.5 | 61532443 | ▇▁▁▁▁ |
| color_id | 0 | 1.00 | 539.67 | 2044.86 | -1 | 8.00 | 28 | 135.0 | 9999 | ▇▁▁▁▁ |
| design_id | 23682 | 0.72 | 45569.87 | 30750.66 | 1001 | 18453.75 | 41748 | 75474.5 | 107520 | ▇▆▅▅▃ |
The colors table contains information about the official LEGO brick colors.
The table contains 263 records.
The attributes of this table are:
| id | name | rgb | is_trans | |
|---|---|---|---|---|
| Min. : -1.0 | Length:263 | Length:263 | Length:263 | |
| 1st Qu.: 83.0 | Class :character | Class :character | Class :character | |
| Median :1005.0 | Mode :character | Mode :character | Mode :character | |
| Mean : 651.4 | NA | NA | NA | |
| 3rd Qu.:1070.5 | NA | NA | NA | |
| Max. :9999.0 | NA | NA | NA |
| id | name | rgb | is_trans |
|---|---|---|---|
| -1 | [Unknown] | 0033B2 | f |
| 0 | Black | 05131D | f |
| 1 | Blue | 0055BF | f |
| 2 | Green | 237841 | f |
| 3 | Dark Turquoise | 008F9B | f |
| 4 | Red | C91A09 | f |
| Name | colors |
|---|---|
| Number of rows | 263 |
| Number of columns | 4 |
| Column type frequency: | |
| character | 3 |
| numeric | 1 |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| name | 0 | 1 | 3 | 28 | 0 | 263 | 0 |
| rgb | 0 | 1 | 6 | 6 | 0 | 223 | 0 |
| is_trans | 0 | 1 | 1 | 1 | 0 | 2 | 0 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| id | 0 | 1 | 651.38 | 750.55 | -1 | 83 | 1005 | 1070.5 | 9999 | ▇▁▁▁▁ |
The inventory_minifigs table lists the LEGO minifigures contained in each inventory.
The table contains 20858 records.
The attributes of this table are:
| inventory_id | fig_num | quantity | |
|---|---|---|---|
| Min. : 3 | Length:20858 | Min. : 1.000 | |
| 1st Qu.: 7869 | Class :character | 1st Qu.: 1.000 | |
| Median : 15681 | Mode :character | Median : 1.000 | |
| Mean : 43010 | NA | Mean : 1.062 | |
| 3rd Qu.: 66834 | NA | 3rd Qu.: 1.000 | |
| Max. :194312 | NA | Max. :100.000 |
| inventory_id | fig_num | quantity |
|---|---|---|
| 3 | fig-001549 | 1 |
| 4 | fig-000764 | 1 |
| 19 | fig-000555 | 1 |
| 25 | fig-000574 | 1 |
| 26 | fig-000842 | 1 |
| 26 | fig-008641 | 1 |
| Name | inventory_minifigs |
|---|---|
| Number of rows | 20858 |
| Number of columns | 3 |
| Column type frequency: | |
| character | 1 |
| numeric | 2 |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| fig_num | 0 | 1 | 10 | 10 | 0 | 13455 | 0 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| inventory_id | 0 | 1 | 43010.44 | 52256.78 | 3 | 7869 | 15681 | 66834 | 194312 | ▇▁▁▁▁ |
| quantity | 0 | 1 | 1.06 | 0.78 | 1 | 1 | 1 | 1 | 100 | ▇▁▁▁▁ |
The minifigs table contains information about LEGO minifigures.
The table contains 13764 records.
The attributes of this table are:
| fig_num | name | num_parts | img_url | |
|---|---|---|---|---|
| Length:13764 | Length:13764 | Min. : 0.000 | Length:13764 | |
| Class :character | Class :character | 1st Qu.: 4.000 | Class :character | |
| Mode :character | Mode :character | Median : 4.000 | Mode :character | |
| NA | NA | Mean : 5.296 | NA | |
| NA | NA | 3rd Qu.: 5.000 | NA | |
| NA | NA | Max. :156.000 | NA |
| fig_num | name | num_parts | img_url |
|---|---|---|---|
| fig-000001 | Toy Store Employee | 4 | https://cdn.rebrickable.com/media/sets/fig-000001.jpg |
| fig-000002 | Customer Kid | 4 | https://cdn.rebrickable.com/media/sets/fig-000002.jpg |
| fig-000003 | Assassin Droid, White | 8 | https://cdn.rebrickable.com/media/sets/fig-000003.jpg |
| fig-000004 | Man, White Torso, Black Legs, Brown Hair | 4 | https://cdn.rebrickable.com/media/sets/fig-000004.jpg |
| fig-000005 | Captain America with Short Legs | 3 | https://cdn.rebrickable.com/media/sets/fig-000005.jpg |
| fig-000006 | Lloyd Avatar | 5 | https://cdn.rebrickable.com/media/sets/fig-000006.jpg |
| Name | minifigs |
|---|---|
| Number of rows | 13764 |
| Number of columns | 4 |
| Column type frequency: | |
| character | 3 |
| numeric | 1 |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| fig_num | 0 | 1 | 10 | 10 | 0 | 13764 | 0 |
| name | 0 | 1 | 1 | 148 | 0 | 13354 | 0 |
| img_url | 0 | 1 | 53 | 53 | 0 | 13764 | 0 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| num_parts | 0 | 1 | 5.3 | 6.03 | 0 | 4 | 4 | 5 | 156 | ▇▁▁▁▁ |
The inventory_sets table lists the LEGO sets contained in each inventory.
The table contains 4358 records.
The attributes of this table are:
| inventory_id | set_num | quantity | |
|---|---|---|---|
| Min. : 35 | Length:4358 | Min. : 1.000 | |
| 1st Qu.: 8076 | Class :character | 1st Qu.: 1.000 | |
| Median : 16423 | Mode :character | Median : 1.000 | |
| Mean : 52519 | NA | Mean : 1.813 | |
| 3rd Qu.: 98685 | NA | 3rd Qu.: 1.000 | |
| Max. :191576 | NA | Max. :60.000 |
| inventory_id | set_num | quantity |
|---|---|---|
| 35 | 75911-1 | 1 |
| 35 | 75912-1 | 1 |
| 39 | 75048-1 | 1 |
| 39 | 75053-1 | 1 |
| 50 | 4515-1 | 1 |
| 50 | 4520-1 | 2 |
| Name | inventory_sets |
|---|---|
| Number of rows | 4358 |
| Number of columns | 3 |
| Column type frequency: | |
| character | 1 |
| numeric | 2 |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| set_num | 0 | 1 | 5 | 20 | 0 | 3171 | 0 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| inventory_id | 0 | 1 | 52518.95 | 59063.13 | 35 | 8076 | 16423 | 98685 | 191576 | ▇▁▁▂▁ |
| quantity | 0 | 1 | 1.81 | 5.67 | 1 | 1 | 1 | 1 | 60 | ▇▁▁▁▁ |
The sets table contains information about LEGO sets.
The table contains 21880 records.
The attributes of this table are:
| set_num | name | year | theme_id | num_parts | img_url | |
|---|---|---|---|---|---|---|
| Length:21880 | Length:21880 | Min. :1949 | Min. : 1 | Min. : 0.0 | Length:21880 | |
| Class :character | Class :character | 1st Qu.:2001 | 1st Qu.:273 | 1st Qu.: 3.0 | Class :character | |
| Mode :character | Mode :character | Median :2012 | Median :497 | Median : 31.0 | Mode :character | |
| NA | NA | Mean :2008 | Mean :442 | Mean : 161.4 | NA | |
| NA | NA | 3rd Qu.:2018 | 3rd Qu.:608 | 3rd Qu.: 139.0 | NA | |
| NA | NA | Max. :2024 | Max. :752 | Max. :11695.0 | NA |
| set_num | name | year | theme_id | num_parts | img_url |
|---|---|---|---|---|---|
| 001-1 | Gears | 1965 | 1 | 43 | https://cdn.rebrickable.com/media/sets/001-1.jpg |
| 0011-2 | Town Mini-Figures | 1979 | 67 | 12 | https://cdn.rebrickable.com/media/sets/0011-2.jpg |
| 0011-3 | Castle 2 for 1 Bonus Offer | 1987 | 199 | 0 | https://cdn.rebrickable.com/media/sets/0011-3.jpg |
| 0012-1 | Space Mini-Figures | 1979 | 143 | 12 | https://cdn.rebrickable.com/media/sets/0012-1.jpg |
| 0013-1 | Space Mini-Figures | 1979 | 143 | 12 | https://cdn.rebrickable.com/media/sets/0013-1.jpg |
| 0014-1 | Space Mini-Figures | 1979 | 143 | 2 | https://cdn.rebrickable.com/media/sets/0014-1.jpg |
| Name | sets |
|---|---|
| Number of rows | 21880 |
| Number of columns | 6 |
| Column type frequency: | |
| character | 3 |
| numeric | 3 |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| set_num | 0 | 1 | 3 | 20 | 0 | 21880 | 0 |
| name | 0 | 1 | 2 | 93 | 0 | 18752 | 0 |
| img_url | 0 | 1 | 46 | 63 | 0 | 21880 | 0 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| year | 0 | 1 | 2007.76 | 13.96 | 1949 | 2001 | 2012 | 2018 | 2024 | ▁▁▁▃▇ |
| theme_id | 0 | 1 | 441.97 | 215.53 | 1 | 273 | 497 | 608 | 752 | ▃▃▃▇▇ |
| num_parts | 0 | 1 | 161.38 | 418.14 | 0 | 3 | 31 | 139 | 11695 | ▇▁▁▁▁ |
The themes table contains information about the original categories (themes) of sets as well as collaborations (for example, LEGO Star Wars).
The table contains 468 records.
The attributes of this table are:
| id | name | parent_id | |
|---|---|---|---|
| Min. : 1.0 | Length:468 | Min. : 1.0 | |
| 1st Qu.:250.5 | Class :character | 1st Qu.:186.0 | |
| Median :466.0 | Mode :character | Median :411.0 | |
| Mean :433.5 | NA | Mean :360.6 | |
| 3rd Qu.:625.2 | NA | 3rd Qu.:512.5 | |
| Max. :752.0 | NA | Max. :697.0 | |
| NA | NA | NA’s :145 |
| id | name | parent_id |
|---|---|---|
| 1 | Technic | NA |
| 3 | Competition | 1 |
| 4 | Expert Builder | 1 |
| 16 | RoboRiders | 1 |
| 17 | Speed Slammers | 1 |
| 18 | Star Wars | 1 |
| Name | themes |
|---|---|
| Number of rows | 468 |
| Number of columns | 3 |
| Column type frequency: | |
| character | 1 |
| numeric | 2 |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| name | 0 | 1 | 2 | 42 | 0 | 385 | 0 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| id | 0 | 1.00 | 433.46 | 216.55 | 1 | 250.5 | 466 | 625.25 | 752 | ▅▅▅▆▇ |
| parent_id | 145 | 0.69 | 360.64 | 197.19 | 1 | 186.0 | 411 | 512.50 | 697 | ▅▃▂▇▂ |
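The `parent_id` column makes `themes` hierarchical: 145 themes have no parent (`NA`), and the rest point to the `id` of another theme. A minimal sketch of resolving each theme's parent name with a self-join, using only the six preview rows of `themes`; `themes_demo`, `with_parent`, and `parent_name` are illustrative names that do not appear in the report.

```r
# First rows of `themes`, copied from the preview table.
themes_demo <- data.frame(
  id = c(1, 3, 4, 16, 17, 18),
  name = c("Technic", "Competition", "Expert Builder",
           "RoboRiders", "Speed Slammers", "Star Wars"),
  parent_id = c(NA, 1, 1, 1, 1, 1)
)

# Self-join: match each row's parent_id against the id of another theme.
# all.x = TRUE keeps top-level themes, whose parent_id is NA.
with_parent <- merge(
  themes_demo,
  data.frame(parent_id = themes_demo$id, parent_name = themes_demo$name),
  by = "parent_id",
  all.x = TRUE
)
with_parent[order(with_parent$id), c("id", "name", "parent_name")]
```

In this preview, Technic is a top-level theme and the other five rows are its sub-themes. Because names repeat across the hierarchy (385 unique names for 468 rows), grouping by `name` alone may merge distinct themes.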
In this section, the loaded data is analysed to uncover interesting relationships and to identify trends in LEGO bricks over the years.
The chart below shows the 10 minifigures that occur most frequently in LEGO sets.
# Count the inventories in which each minifigure appears and plot the top 10.
inventory_minifigs %>%
  merge(inventories, by.x = "inventory_id", by.y = "id") %>%
  merge(minifigs, by = "fig_num") %>%
  group_by(fig_num, name) %>%
  summarize(fig_count = n(), .groups = "drop") %>%
  arrange(desc(fig_count)) %>%
  mutate(name = strtrim(name, 20)) %>%  # shorten long names for the axis
  head(10) %>%
  ggplot(aes(x = reorder(name, fig_count), y = fig_count)) +
  geom_bar(stat = "identity", fill = "#00abff") +
  coord_flip() +
  ggtitle("10 most popular minifigures") +
  labs(x = "Minifigure", y = "Number of sets containing the minifigure") +
  theme_bw() +
  easy_center_title() -> pop_fig
ggplotly(pop_fig)
The chart shows that the most popular minifigure is the skeleton, which appears in 43 sets.
The pie chart and table below summarise the percentage distribution of the materials from which LEGO parts are made. Almost all of them (about 98.88%) are made of plastic. The next most common material is rubber (about 0.71%).
# Share of each material among all part entries in the inventories.
parts %>%
  merge(inventory_parts, by = "part_num") %>%
  group_by(part_material) %>%
  summarize(material_count = n()) %>%
  arrange(desc(material_count)) %>%
  mutate(prop = material_count / sum(material_count)) -> parts_by_material
# Pie chart: a single stacked bar drawn in polar coordinates.
parts_by_material %>%
  ggplot(aes(x = "", y = prop, fill = part_material)) +
  geom_bar(stat = "identity", width = 1, color = "white") +
  coord_polar("y", start = 0) +
  theme_void()
parts_by_material %>%
  mutate(prop = formattable::percent(prop)) %>%
  kable(col.names = c("Material", "Number of parts", "Percent"))
| Material | Number of parts | Percent |
|---|---|---|
| Plastic | 1167738 | 98.88% |
| Rubber | 8440 | 0.71% |
| Cardboard/Paper | 2367 | 0.20% |
| Cloth | 2029 | 0.17% |
| Flexible Plastic | 245 | 0.02% |
| Metal | 101 | 0.01% |
| Foam | 67 | 0.01% |
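The percentages can be checked directly from the counts in the table above. In the sketch below the counts are copied from that table, and the names `material_count`, `total`, and `share` are illustrative. The counts sum to 1180987, which is the number of records in `inventory_parts`, so the shares appear to describe inventory entries rather than distinct part designs.

```r
# Counts copied from the materials table.
material_count <- c(
  "Plastic" = 1167738, "Rubber" = 8440, "Cardboard/Paper" = 2367,
  "Cloth" = 2029, "Flexible Plastic" = 245, "Metal" = 101, "Foam" = 67
)

total <- sum(material_count)      # 1180987
share <- material_count / total   # proportion of each material

# Percentages rounded to two decimals, as in the table.
round(100 * share, 2)
```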
The chart below shows the 10 themes that contain the most sets.
# Count the sets in each theme and plot the 10 largest themes.
themes %>%
  rename(theme_name = name) %>%
  merge(sets, by.x = "id", by.y = "theme_id") %>%
  group_by(theme_name) %>%
  summarize(theme_count = n()) %>%
  arrange(desc(theme_count)) %>%
  head(10) %>%
  ggplot(aes(x = reorder(theme_name, theme_count), y = theme_count)) +
  geom_bar(stat = "identity", fill = "#00abff") +
  coord_flip() +
  ggtitle("10 most popular set themes") +
  labs(x = "Set theme", y = "Number of sets in the theme") +
  theme_bw() +
  easy_center_title() -> pop_themes
ggplotly(pop_themes)
The chart above shows that the most popular theme is Books, with 973 sets in that category.
The next analysis concerns the size of the sets in the dataset. The chart and table below present the 10 largest LEGO sets, by number of parts, in the Rebrickable dataset.
# Select the 10 sets with the most parts; img_url becomes an HTML image tag for the table.
sets %>%
  arrange(desc(num_parts)) %>%
  select(name, img_url, num_parts) %>%
  mutate(img_url = paste0("<", "img src=", img_url, " height=120 width=100>")) %>%
  head(10) -> big_sets
big_sets %>%
  ggplot(aes(x = reorder(name, num_parts), y = num_parts)) +
  geom_bar(stat = "identity", fill = "#00abff") +
  coord_flip() +
  ggtitle("10 largest sets") +
  labs(x = "Set", y = "Number of parts in the set") +
  theme_bw() +
  easy_center_title() -> big_sets_fig
ggplotly(big_sets_fig)
kable(big_sets, col.names = c("Name", "", "Number of parts"))
| Name | Number of parts |
|---|---|
| World Map | 11695 |
| Eiffel Tower | 10001 |
| The Ultimate Battle for Chima | 9987 |
| Titanic | 9092 |
| Colosseum | 9036 |
| Millennium Falcon | 7541 |
| AT-AT | 6785 |
| The Razor Crest | 6194 |
| Lord of the Rings: Rivendell | 6182 |
| NINJAGO City Markets | 6163 |
The largest LEGO set in the dataset is World Map, which consists of exactly 11695 parts.
It was also worth analysing the popularity of colors in the dataset. The chart below shows the 10 most popular part colors.
# Count elements per color and fill each bar with the color's own RGB value.
colors %>%
  rename(color_name = name) %>%
  merge(elements, by.x = "id", by.y = "color_id") %>%
  merge(parts, by = "part_num") %>%
  group_by(rgb, color_name) %>%
  summarize(color_count = n(), .groups = "drop") %>%
  arrange(desc(color_count)) %>%
  mutate(rgb = paste0("#", rgb)) %>%  # turn "05131D" into a hex color "#05131D"
  head(10) %>%
  ggplot(aes(x = reorder(color_name, color_count), y = color_count, fill = rgb)) +
  geom_bar(stat = "identity", color = "black") +
  coord_flip() +
  scale_fill_identity() +
  ggtitle("10 most popular part colors") +
  labs(x = "Color", y = "Number of parts") +
  theme_bw() + theme(legend.position = "none") +
  easy_center_title() -> pop_color
ggplotly(pop_color)
The chart above shows that the most popular colors of LEGO parts are black and white.
The chart below shows the 10 most frequent part categories.
# Count parts per category; after the rename, `name` refers to the category name.
parts %>%
  rename(part_name = name) %>%
  merge(part_categories, by.x = "part_cat_id", by.y = "id") %>%
  group_by(part_cat_id, name) %>%
  summarize(part_count = n(), .groups = "drop") %>%
  arrange(desc(part_count)) %>%
  head(10) %>%
  ggplot(aes(x = reorder(name, part_count), y = part_count)) +
  geom_bar(stat = "identity", fill = "#00abff") +
  coord_flip() +
  ggtitle("10 most popular part categories") +
  labs(x = "Category", y = "Number of parts in the category") +
  theme_bw() +
  easy_center_title() -> pop_cat_fig
ggplotly(pop_cat_fig)
The chart above shows that the largest category is Minifig Upper Body, which contains 6329 parts.
The next analysis concerns the popularity of parts in sets. The chart and table below show the 10 most popular parts in LEGO sets.
# Count inventory entries per part and color; img_url becomes an HTML image tag for the table.
parts %>%
  merge(inventory_parts, by = "part_num") %>%
  merge(inventories, by.x = "inventory_id", by.y = "id") %>%
  group_by(part_num, name, img_url, color_id) %>%
  summarize(part_count = n(), .groups = "drop") %>%
  mutate(img_url = paste0("<", "img src=", img_url, " height=100 width=100>")) %>%
  select(part_num, name, img_url, part_count, color_id) %>%
  arrange(desc(part_count)) %>%
  head(10) -> pop_parts
pop_parts %>%
  mutate(name = strtrim(name, 20)) %>%
  ggplot(aes(x = reorder(paste0(part_num, ": ", name, " (", color_id, ")"), part_count), y = part_count)) +
  geom_bar(stat = "identity", fill = "#00abff") +
  coord_flip() +
  ggtitle("10 most popular parts in sets") +
  labs(x = "Part", y = "Number of occurrences of the part") +
  theme_bw() +
  easy_center_title() -> pop_parts_fig
ggplotly(pop_parts_fig)
pop_parts %>%
  select(part_num, name, img_url, part_count) %>%
  kable(col.names = c("Part no.", "Part name", "", "Total number of occurrences"))
| Part no. | Part name | Total number of occurrences |
|---|---|---|
| 2780 | Technic Pin with Friction Ridges Lengthwise and Center Slots | 4769 |
| 6141 | Plate Round 1 x 1 with Solid Stud | 2922 |
| 3023 | Plate 1 x 2 | 2630 |
| 6141 | Plate Round 1 x 1 with Solid Stud | 2371 |
| 3673 | Technic Pin without Friction Ridges Lengthwise | 2312 |
| 43093 | Technic Axle Pin with Friction Ridges Lengthwise | 2264 |
| 3023 | Plate 1 x 2 | 2256 |
| 4274 | Technic Pin 1/2 | 2237 |
| 3022 | Plate 2 x 2 | 2098 |
| 3020 | Plate 2 x 4 | 2092 |
The most frequently occurring part in sets is Technic Pin with Friction Ridges Lengthwise and Center Slots, which appears a total of 4769 times across all LEGO sets.
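Parts 6141 and 3023 each appear twice in the table above because the counts are grouped by `color_id` as well as by part number, so every row is one part in one color. A hedged sketch of collapsing the colors with base R follows. The counts are copied from the table, while the `color` labels are hypothetical placeholders because the table does not show `color_id`; `pop_parts_demo` and `per_part` are illustrative names.

```r
# Counts copied from the table above; the color labels are placeholders.
pop_parts_demo <- data.frame(
  part_num = c("2780", "6141", "3023", "6141", "3023"),
  color = c("A", "B", "C", "D", "E"),
  part_count = c(4769, 2922, 2630, 2371, 2256)
)

# Sum the per-color counts so that each part number appears once.
per_part <- aggregate(part_count ~ part_num, data = pop_parts_demo, FUN = sum)
per_part[order(-per_part$part_count), ]
```

These sums cover only the rows that reached the top 10, so they are lower bounds. A ranking of parts regardless of color would need the same aggregation applied to the full data before taking the first 10 rows.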